TEXT CLUSTERING POWERED BY SEMANTICO-SYNTACTIC FEATURES
Abstract
Subject of Research. The study is devoted to improving text clustering quality indicators. The main attention is paid to the extraction of the features that form the mathematical model of the texts. The k-means method is used to cluster the resulting vector representations of the texts. Method. An analytical approach based on semantico-syntactic features of the clustered texts is proposed. Feature extraction was performed using the Stanford CoreNLP toolkit. Some links between the words of the texts in the “Enhanced++ Dependencies” representation were encoded together with the words they connect. The values of the semantico-syntactic features were calculated from the frequencies of the encoded links in the texts. Main Results. An experiment compared the quality indicators of a prototype built on the proposed method with those of a clustering system based on statistical features; the proposed method reduced the number of clustering errors by more than 15%. Practical Relevance. Pre-training is not required to obtain semantico-syntactic features of texts. Therefore, the proposed approach can be used to improve clustering quality indicators when large text corpora, which are needed to pre-train statistical language models based on word embeddings, are not available.
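The pipeline outlined in the abstract (encode dependency links together with their words, turn link frequencies into feature vectors, cluster with k-means) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the dependency triples are hand-written stand-ins for parser output rather than real Stanford CoreNLP results, the `relation(governor,dependent)` encoding and the relative-frequency weighting are one plausible choice, and the small k-means with farthest-point initialization is a simplified stand-in for whatever implementation the authors used.

```python
from collections import Counter
import math

# Hypothetical dependency triples (relation, governor, dependent), one list per text.
# In a real pipeline these would come from a dependency parser.
DOCS = [
    [("nsubj", "rise", "price"), ("obj", "raise", "rate"), ("nsubj", "raise", "bank")],
    [("nsubj", "raise", "bank"), ("obj", "raise", "rate"), ("nsubj", "fall", "price")],
    [("nsubj", "win", "team"), ("obj", "win", "match"), ("nsubj", "score", "player")],
    [("nsubj", "score", "player"), ("obj", "score", "goal"), ("nsubj", "win", "team")],
]


def encode_link(rel, gov, dep):
    """Encode a link together with the two words it connects (assumed format)."""
    return f"{rel}({gov},{dep})"


def feature_vectors(docs):
    """Map each text to the relative frequencies of its encoded links."""
    counts = [Counter(encode_link(*triple) for triple in doc) for doc in docs]
    vocab = sorted(set().union(*counts))
    vectors = [[c[f] / sum(c.values()) for f in vocab] for c in counts]
    return vocab, vectors


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next centroid: the vector farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vocab, vectors = feature_vectors(DOCS)
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy input the first two texts share two encoded links, as do the last two, so texts with overlapping link inventories end up close in the feature space. No pre-trained model is involved at any step, which is the property the abstract highlights.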
Articles in current issue
- COMPARISON OF ON-LINE AND OFF-LINE FRIED PARAMETER ESTIMATION METHODS
- INTERFERENCE OF MULTI-MODE WEAK COHERENT STATES FOR TWIN-FIELD QUANTUM COMMUNICATION APPLICATIONS
- FIBER COUPLED LASER DIODE MODULE ALIGNMENT
- EFFECT OF OXIDE ADDITIVES ON UP-CONVERSION LUMINESCENCE OF ERBIUM IONS IN ALKALINE GERMANATE GLASSES
- MINIMIZATION OF NOISE FLOOR LEVEL OF FIBER-OPTIC INTERFEROMETRIC SENSOR ARRAY BY ADJUSTMENT OF OPERATIONAL AMPLIFIER CASCADE PARAMETERS
- ASSESSMENT OF CUTANEOUS BLOOD FLOW IN LOWER EXTREMITIES BY IMAGING PHOTOPLETHYSMOGRAPHY METHOD
- DIGITAL IMAGE ABERRATION CORRECTION TECHNIQUE FOR STRUCTURED ILLUMINATION MICROSCOPY
- THREE-DIMENSIONAL SIMULATION OF VOLUME PICTORIAL HOLOGRAM BY PHOTOGRAMMETRY METHOD
- OPTIMIZATION TECHNIQUES APPLIED TO INITIAL DESIGNS OF ULTRAVIOLET LITHOGRAPHIC OBJECTIVE
- NONDESTRUCTIVE EXPOSURE OF DIRECTED OPTICAL RADIATION ON DEVICES WITH LIGHT-SENSITIVE SENSORS
- ROBUST DYNAMICAL FEEDBACK DESIGN FOR BALL POSITION CONTROL ON ROTARY PLATFORM
- MORPHOLOGY AND OPTICAL PROPERTIES OF AlN FILMS ON SAPPHIRE
- SIMULATION MODEL OF REDUNDANT MACHINE-TO-MACHINE EXCHANGE WITH ORGANIZATION OF QUEUES FOR ACCESS TO AGGREGATED CHANNELS
- SEA SURFACE IMAGE SUB-BAND ANALYSIS BASED ON COSINE TRANSFORM
- INTEGRATED ENVIRONMENT ARCHITECTURE FOR SOFTWARE DEVELOPMENT WITH STRUCTURED EDITING SUPPORT
- IMPLEMENTATION OF AGENT INTERACTION PROTOCOL WITHIN CLOUD INFRASTRUCTURE IN GEOGRAPHICALLY DISTRIBUTED DATA CENTERS
- IDENTIFICATION OF EQUIPMENT DEGRADATION PHASE IN PREVENTATIVE MAINTENANCE SYSTEMS
- ENERGY-BASED ANALYSIS OF BIOINSPIRED MECHANISM FOR CHEETAH ROBOT LEG
- MODELING OF RESONANCE EFFECTS IN SPINE WITH ADDITIONAL FIXING ELEMENTS
- SIDE-CHANNEL INFORMATION LEAK DETECTION WITH WAVELET TRANSFORMATION
- SIMULATION OF UNORGANIZED GROUP BEHAVIOR IN CASE OF EMERGENCY
- EFFICIENCY RESEARCH OF SIGNAL RECOVERY ALGORITHMS WITH LONG GAPS AND RARE ARRIVAL OF MEASUREMENTS
- MODELING OF WIRELESS NETWORKS IN OMNET++ ENVIRONMENT INVOLVING INET FRAMEWORK
- LEARNING ENVIRONMENT DESIGN USING ETHEREUM BLOCKCHAIN SMART CONTRACTS
- WEB PORTALS FOR MANAGEMENT OF CLOUD SERVICES WITHIN DATA CENTRES